As the autonomous driving industry slowly matures, visual map localization is quickly becoming the standard approach for localizing a vehicle as accurately as possible. Thanks to the rich data returned by visual sensors such as cameras or LiDAR, researchers are able to build different types of maps with various levels of detail and use them to achieve high levels of vehicle localization accuracy and stability in urban environments. In contrast to popular SLAM approaches, visual map localization relies on pre-built maps and improves localization accuracy precisely by avoiding error accumulation and drift. We define visual map localization as a two-stage process. In the place recognition stage, the initial position of the vehicle in the map is determined by comparing the visual sensor output against a set of geo-tagged map regions. Subsequently, in the map metric localization stage, the vehicle is tracked as it moves across the map by continuously aligning the visual sensor output with the current region of the map being traversed. In this paper, we survey, discuss, and compare recent LiDAR-based, camera-based, and cross-modal visual map localization methods for both stages, in order to highlight the strengths of each approach.
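The two-stage process above can be sketched in miniature. This is an illustrative toy, not any surveyed method's pipeline: stage one is a nearest-neighbour search over geo-tagged global descriptors, and stage two is reduced to a centroid-based translation estimate standing in for a real registration step such as ICP.

```python
import numpy as np

def place_recognition(query_desc, map_descs):
    """Stage 1 (toy): return the index of the geo-tagged map region whose
    global descriptor is nearest to the query sensor descriptor."""
    dists = np.linalg.norm(map_descs - query_desc, axis=1)
    return int(np.argmin(dists))

def metric_localization(sensor_pts, map_pts):
    """Stage 2 (toy): estimate the translation aligning the sensor point
    set with the current map region via centroid alignment. A real system
    would run ICP or feature-based registration here."""
    return map_pts.mean(axis=0) - sensor_pts.mean(axis=0)
```

Repeating stage two as the vehicle moves is what keeps drift from accumulating, since every alignment is anchored to the pre-built map rather than to the previous pose.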
Point cloud semantic segmentation has attracted attention due to its robustness to lighting conditions, which makes it an ideal semantic solution for autonomous driving. However, given the heavy computational burden and bandwidth demands of neural networks, putting all of the computation on the vehicle's electronic control unit (ECU) is neither efficient nor practical. In this paper, we propose a lightweight point cloud semantic segmentation network based on the range view. Thanks to its simple pre-processing and standard convolutions, it is efficient when running on deep learning accelerators such as a DPU. In addition, a near-sensor computing system is built for autonomous vehicles: an FPGA-based deep learning accelerator core (DPU) placed next to the LiDAR sensor performs the point cloud pre-processing and runs the segmentation neural network. By leaving only the post-processing step to the ECU, this solution greatly relieves the ECU's computational burden and consequently shortens decision-making and vehicle reaction latency. Our semantic segmentation network achieves 10 frames per second (FPS) on a Xilinx DPU with a computational efficiency of 42.5 GOP/W.
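The "simple pre-processing" of range-view methods is typically a spherical projection of the LiDAR sweep onto a 2-D range image that standard convolutions can then consume. A minimal sketch of that standard projection is below; the field-of-view bounds and image size are generic defaults, not the paper's configuration.

```python
import numpy as np

def range_view_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.
    Rows index elevation (pitch), columns index azimuth (yaw), and each
    pixel stores the range of the point that lands there."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    rows = ((1.0 - (pitch - fd) / (fu - fd)) * (h - 1)).astype(int)
    cols = ((0.5 * (1.0 - yaw / np.pi)) * (w - 1)).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    img[np.clip(rows, 0, h - 1), np.clip(cols, 0, w - 1)] = r
    return img
```

Because the output is a dense 2-D grid, the downstream network can use plain convolutions, which is what makes this pipeline map cleanly onto accelerators like a DPU.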
Deep learning provides a powerful new approach to many computer vision tasks. Height prediction from aerial images is one of the tasks that has benefited greatly from the deployment of deep learning, replacing older multi-view geometry techniques. This letter proposes a two-stage approach in which a multi-task neural network is first used to predict a height map from a single RGB aerial input image. We also include a second refinement step used to produce higher-quality height maps. Experiments on two publicly available datasets show that our approach is able to produce state-of-the-art results. Code is available at https://github.com/melhousni/dsmnet.
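The two-stage inference described above can be expressed as a simple composition. The network interfaces here are hypothetical placeholders (the abstract does not specify them): the multi-task model is assumed to return the coarse height map alongside its auxiliary outputs, and the refinement model maps a coarse height map to a refined one.

```python
import numpy as np

def two_stage_height(rgb, multitask_net, refine_net):
    """Sketch of the two-stage pipeline: stage 1 predicts a coarse height
    map (plus auxiliary task outputs) from a single RGB aerial image;
    stage 2 refines it into a higher-quality height map."""
    coarse, _aux = multitask_net(rgb)   # stage 1: multi-task prediction
    return refine_net(coarse)           # stage 2: refinement
```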
The roadheader is an engineering robot widely used in underground engineering and the mining industry. Interactive dynamics simulation of roadheaders is a fundamental problem in unmanned excavation and virtual-reality training. However, current research is based only on traditional animation techniques or commercial game engines, and few studies apply real-time physics simulation from computer graphics to the roadheader robot domain. This paper presents a physics-based interactive simulation system for roadheader robots. To this end, an improved multibody simulation method based on generalized coordinates is proposed. First, our simulation method describes the robot dynamics in generalized coordinates. Compared with state-of-the-art methods, our method is more stable and accurate: numerical results show that, for the same number of iterations, its error is significantly smaller than that of a game engine. Second, we adopt a symplectic Euler integrator for the dynamics iteration instead of the conventional fourth-order Runge-Kutta (RK4) method. Compared with other integrators, our method is more stable in terms of energy drift during long-term simulation. Test results show that our system achieves real-time interactive performance of 60 frames per second (FPS). Furthermore, we propose a model format for roadheader robot modeling to implement the system. Our interactive roadheader simulation system meets the requirements of interactivity, accuracy, and stability.
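The energy-drift argument for the symplectic Euler integrator is easy to demonstrate on a harmonic oscillator. For brevity this sketch contrasts it with plain explicit Euler rather than RK4: the symplectic update uses the *new* momentum when advancing position, which keeps the energy bounded over long runs, while explicit Euler's energy grows without bound.

```python
def symplectic_euler(q, p, dt, steps, k=1.0, m=1.0):
    """Semi-implicit (symplectic) Euler for a unit harmonic oscillator:
    momentum is updated first, then position uses the new momentum."""
    for _ in range(steps):
        p -= k * q * dt
        q += p / m * dt
    return q, p

def explicit_euler(q, p, dt, steps, k=1.0, m=1.0):
    """Plain explicit Euler for comparison: both updates use old values,
    and the energy grows by a factor (1 + (dt*w)^2) every step."""
    for _ in range(steps):
        q, p = q + p / m * dt, p - k * q * dt
    return q, p

def energy(q, p, k=1.0, m=1.0):
    """Total mechanical energy: kinetic + potential."""
    return 0.5 * p * p / m + 0.5 * k * q * q
```

After 10,000 steps at dt = 0.01 (100 oscillator periods of simulated time), the symplectic trajectory stays within a fraction of a percent of the initial energy 0.5, whereas the explicit trajectory has more than doubled its energy.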
The irregularity and disorder of point clouds pose many challenges for point cloud analysis. PointMLP has shown that geometric information is not the only key to point cloud analysis: it achieves promising results with a simple multi-layer perceptron (MLP) structure equipped with a geometric affine module. However, such MLP-like structures aggregate features only with fixed weights, ignoring the differences in semantic information between point features. We therefore propose a new vector-oriented representation of point features that improves feature aggregation by introducing an inductive bias. The direction introduced by the vector representation can dynamically modulate the aggregation of two point features according to their semantic relationship. Based on this, we design a novel Point2Vector MLP architecture. Experiments show that it achieves state-of-the-art performance on the classification task of the ScanObjectNN dataset, with a 1% improvement over the previous best method. We hope our method can help people better understand the role of semantic information in point cloud analysis and lead to the exploration of more and better feature representations or other approaches.
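The contrast with fixed-weight aggregation can be illustrated with a toy direction-aware aggregator. This is not the paper's exact operator, only the general idea: each neighbour's contribution is weighted by how well its feature direction agrees with the centre feature, so semantically aligned neighbours contribute more.

```python
import numpy as np

def vector_aggregate(center_feat, neighbor_feats):
    """Toy direction-modulated aggregation: weight each neighbour by the
    cosine similarity between its feature direction and the centre
    feature, then take a softmax-weighted sum. A fixed-weight MLP
    aggregator would use the same weights regardless of semantics."""
    c = center_feat / (np.linalg.norm(center_feat) + 1e-8)
    n = neighbor_feats / (np.linalg.norm(neighbor_feats, axis=1, keepdims=True) + 1e-8)
    w = n @ c                           # cosine similarity per neighbour
    w = np.exp(w) / np.exp(w).sum()     # softmax over neighbours
    return w @ neighbor_feats
```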
The growing number of Internet of Things (IoT) devices makes it imperative to understand the real-world cybersecurity threats they face. While honeypots have historically been used as decoy devices to help researchers and organizations better understand the dynamics of threats on a network and their impact, IoT devices pose unique challenges for this purpose due to the variety of devices and their physical connections. In this work, by observing the behavior of real-world attackers in a low-interaction honeypot ecosystem, we (1) present a new approach for creating a multi-phase, multi-faceted honeypot ecosystem that gradually increases the sophistication of the honeypots' interactions with adversaries, (2) design and develop a low-interaction honeypot for cameras that allows researchers to gain a deeper understanding of attackers' goals, and (3) devise an innovative data analysis method to identify the goals of adversaries. Our honeypots have been active for over three years. We were able to collect increasingly sophisticated attack data in each phase. Furthermore, our data analysis shows that the vast majority of the attack activity captured in the honeypots shares significant similarity, and can be clustered and grouped to better understand the goals, patterns, and trends of IoT attacks in the wild.
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer block in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though instantiations share the same framework. Motivated by this observation, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN- and Transformer-based models, while trading off model accuracy and efficiency well.
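The Meta-Mobile Block abstraction can be sketched as a single parameterized template. This is a schematic numpy sketch of the abstraction only, not EMO's actual layers: expand channels, apply an interchangeable token mixer (a depthwise convolution yields a MobileNet-style block, self-attention yields a ViT-style block; iRMB combines both flavors), project back, and add the residual.

```python
import numpy as np

def meta_mobile_block(x, w_expand, mixer, w_project):
    """Schematic Meta-Mobile Block over token features x of shape
    (tokens, channels): pointwise expansion with ReLU, an interchangeable
    token mixer, pointwise projection, and a residual connection. The
    choice of `mixer` is the 'specific instantiation' the text refers to."""
    h = np.maximum(x @ w_expand, 0.0)   # pointwise expansion + ReLU
    h = mixer(h)                        # token mixing (conv- or attention-like)
    return x + h @ w_project            # projection back + residual
```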
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
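The triple-to-question construction described above admits a compact sketch. The question templates here are simplified stand-ins, not PIE-QG's actual templates: given a <subject, predicate, object> triple, one question asks for the subject and one asks for the object.

```python
def triples_to_qa(triples):
    """Form synthetic QA training pairs from OpenIE-style
    (subject, predicate, object) triples: the object (or subject) and
    predicate form the question, and the remaining element is the answer."""
    pairs = []
    for subj, pred, obj in triples:
        pairs.append((f"What {pred} {obj}?", subj))   # answer: subject
        pairs.append((f"{subj} {pred} what?", obj))   # answer: object
    return pairs
```

The resulting pairs can then serve as supervision for an extractive reader without any human-labeled questions.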
The Transformer has achieved impressive successes on various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement provided by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To maximally exploit the Transformer's capacity on limited medical data, we propose an auxiliary difficulty ranking task: the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer is driven to distill transformation-invariant features from the perturbed tokens, simultaneously achieving difficulty measurement and maintaining the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared to ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
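The online-predicts-target objective is BYOL-style, and its consistency term can be sketched directly. This is a generic negative-cosine-similarity loss illustrating the idea, not BOLT's full objective (which adds the difficulty ranking task):

```python
import numpy as np

def bolt_style_consistency_loss(online_pred, target_repr):
    """BYOL-style consistency loss: L2-normalize the online branch's
    prediction and the (stop-gradient) target branch's representation of
    differently perturbed tokens, then penalize their negative cosine
    similarity. Identical representations give loss 0; opposite give 4."""
    a = online_pred / (np.linalg.norm(online_pred, axis=-1, keepdims=True) + 1e-8)
    b = target_repr / (np.linalg.norm(target_repr, axis=-1, keepdims=True) + 1e-8)
    return float(2.0 - 2.0 * (a * b).sum(-1).mean())
```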
Knowledge graph embedding (KGE), which maps the entities and relations of a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose AnKGE, a novel and general self-supervised framework that enhances KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity, relation, and triple levels. In AnKGE, we train an analogy function for each level of analogical inference, taking as input the original element embedding from a well-trained KGE model and outputting the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights into the score function used for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
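The interpolation of the two scores reduces to a convex combination with a learned weight. A minimal sketch, simplified from the adaptive weighting AnKGE actually learns (here the weight is a single scalar per query):

```python
def interpolated_score(base_score, analogy_score, lam):
    """Combine the base KGE model score (inductive inference) with the
    analogy score (analogical inference) using an adaptive weight lam
    in [0, 1]; lam = 0 recovers the original KGE model's prediction."""
    return (1.0 - lam) * base_score + lam * analogy_score
```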